Within performance-based measures, the comparison is reduced to the scalar performance score, where a quality function $q$ evaluates the performance of a model with respect to ground-truth labels.
Then, to measure the similarity, the absolute difference of the performance scores is calculated:
\[m_{Perf}(\mathbf{O}, \mathbf{O}')=|q(\mathbf{O})-q(\mathbf{O}')|.\]
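As a minimal sketch, $m_{Perf}$ can be computed in a few lines of Python; here accuracy serves as the quality function $q$, and the output matrices and labels are hypothetical:

```python
import numpy as np

def accuracy(outputs, labels):
    """Fraction of inputs whose argmax prediction matches the ground-truth label."""
    return np.mean(np.argmax(outputs, axis=1) == labels)

def m_perf(O1, O2, labels, q=accuracy):
    """Performance-based similarity: |q(O) - q(O')| for a quality function q."""
    return abs(q(O1, labels) - q(O2, labels))

# Two hypothetical output matrices (3 inputs, 2 classes) and ground-truth labels
O1 = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
O2 = np.array([[0.4, 0.6], [0.1, 0.9], [0.7, 0.3]])
y  = np.array([0, 1, 0])
print(m_perf(O1, O2, y))  # accuracy 1.0 vs 2/3, so about 1/3
```

Any other quality function of the same signature (e.g.\ F1-score) can be passed as $q$.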

As performance metrics, we will use accuracy and the F1-score; in addition, we consider the Brier score and Spearman's rank correlation.

\textbf{Accuracy} measures how many predictions of a model are correct, relative to the total number of predictions \cite{sammut_encyclopedia_2017}.
It is often used in classification problems and is calculated as follows: \[\text{accuracy} = \frac{\text{true positives + true negatives}}{\text{all objects}}.\]
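The accuracy formula can be sketched directly from confusion-matrix counts; the counts below are hypothetical:

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = correct predictions / all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical confusion-matrix counts for a binary classifier
print(accuracy(tp=40, tn=50, fp=5, fn=5))  # 0.9
```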
\textbf{F1-score} is the harmonic mean of Precision and Recall.
It gives a better picture of the incorrectly classified cases than accuracy \cite{sammut_encyclopedia_2017}.
\[\text{F1-score}=2\cdot \frac{\text{Precision}\cdot \text{Recall}}{\text{Precision} + \text{Recall}},\]
where Precision is the measure of the true positives from all predicted positive cases:\[\text{Precision} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}}\]
and Recall the measure of true positives from all actual positives: \[\text{Recall} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}.\]
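Precision, recall and the F1-score can be sketched directly from the formulas above; the confusion counts are hypothetical:

```python
def precision(tp, fp):
    """True positives among all predicted positives."""
    return tp / (tp + fp)

def recall(tp, fn):
    """True positives among all actual positives."""
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

# Hypothetical counts: precision = 0.8, recall = 2/3
print(f1_score(8, 2, 4))  # about 0.727 (= 8/11)
```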
\textbf{Brier score} measures the mean squared deviation of the predicted probabilities from the ideal probabilities $1$ and $0$.
It is used to evaluate the quality of probability estimators.
\[
\text{BS} := \frac{1}{|\mathcal{D}|} \sum_{x \in \mathcal{D}} (p(x) - b(x))^2,
\]
where $\mathcal{D}$ denotes the evaluation set, $p(x)$ the predicted probability and $b(x) \in \{0,1\}$ the label.
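A minimal sketch of the Brier score over a small hypothetical evaluation set:

```python
import numpy as np

def brier_score(p, b):
    """Mean squared deviation of predicted probabilities p from binary labels b."""
    p, b = np.asarray(p, dtype=float), np.asarray(b, dtype=float)
    return np.mean((p - b) ** 2)

# Hypothetical predicted probabilities and binary ground-truth labels
print(brier_score([0.9, 0.2, 0.7, 0.1], [1, 0, 1, 0]))  # about 0.0375
```

Lower values indicate better-calibrated probability estimates; a perfect predictor scores $0$.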

\textbf{Spearman's rank correlation} $\rho$ measures the strength and direction of a monotonic association between two ranked variables, i.e.\ how well a monotonic function can capture the relationship between them.

\[
\rho = 1 - \frac{6 \sum_{i=1}^m (x_i - y_i)^2}{m(m^2 - 1)},
\]
where $x_i$ and $y_i$ denote the ranks of the $i$-th observation under the two variables and $m$ the number of observations.
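The rank-difference formula can be sketched as follows, assuming no ties in the data; the sample values are hypothetical:

```python
import numpy as np

def spearman_rho(a, b):
    """Spearman's rho via the rank-difference formula (assumes no ties)."""
    ra = np.argsort(np.argsort(a)) + 1  # ranks 1..m of the first variable
    rb = np.argsort(np.argsort(b)) + 1  # ranks 1..m of the second variable
    m = len(a)
    d2 = np.sum((ra - rb) ** 2)         # sum of squared rank differences
    return 1 - 6 * d2 / (m * (m**2 - 1))

a = [3.1, 1.2, 5.4, 2.0]   # hypothetical scores
b = [30, 10, 50, 25]       # a perfectly monotone transform of a
print(spearman_rho(a, b))  # 1.0
```

With ties, average ranks should be used instead; library implementations such as `scipy.stats.spearmanr` handle this case.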


%\section{Relation of Intrinsic and Extrinsic Homotopy}
%Within this section, we will focus on the relation of intrinsic and extrinsic homotopy.
%Therefore, we relate $d_\mathcal{C}(h,g)$ and $d_{\mathcal{C}_{V,W}}^\mathcal{H}$

% On Homotopy of Similar Neural Networks using Non-linear Transformations
% Ostbayerische Technische Hochschule Regensburg
% Faculty of Computer Science and Mathematics
% MASTER THESIS
% Bettina Zieger, Student ID: 3401509
% Study Programme: Master Mathematics
% Deadline: 31 July 2025
% Supervisor: Prof. Dr. Stefan Körkel
% Secondary Supervisor: Prof. Dr. Wolfgang Lauf
% External Supervisor: M. Sc. Daniel Kowatsch, Fraunhofer AISEC
% 7 Improving Similarity Analysis using Nonlinear Transformations
that can be seen as a probability distribution over the $N$ classes. It yields the composition:
\[
p_\psi : \Sigma \xrightarrow{h} V \xrightarrow{\psi} W \xrightarrow{\mathrm{softmax}_\lambda} \Delta^{N-1}.
\]
Let us consider the Lipschitz continuous map $\mathrm{softmax}_\lambda : W \to \Delta^{N-1}$ with constant $L$.
We can apply the fact about Lipschitz continuous maps with constant $L$: for any function sets $h, g \in E_V$ the Hausdorff--Hoare map satisfies
\[
d^{\mathcal{H}}_{\Delta^{N-1}}\big(\mathrm{softmax}(\psi \circ h), \mathcal{V}_N(g)\big) \le L \cdot d^{\mathcal{H}}_{W}\big(\mathcal{C}_{V,W}(h), \mathcal{C}_{V,W}(g)\big).
\]
Consequently, the triangle inequality holds for all $1$-Lipschitz continuous maps if the family of neural networks $\mathcal{V}_N$ is $1$-Lipschitz continuous, because every $d^{\mathcal{H}}_{\Delta^{N-1}}$ is bounded by $L \cdot d^{\mathcal{H}}_{W}$.
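The Lipschitz property of softmax can be checked empirically; the sketch below samples random input pairs and verifies that the ratio $\|\mathrm{softmax}(x)-\mathrm{softmax}(y)\|_2 / \|x-y\|_2$ stays below $1$ (the spectral norm of the softmax Jacobian $\mathrm{diag}(p) - pp^\top$ is at most $1$). The dimension and sample count are arbitrary choices:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax mapping logits to the probability simplex."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Empirically check ||softmax(x) - softmax(y)||_2 <= 1 * ||x - y||_2
rng = np.random.default_rng(0)
ratios = []
for _ in range(1000):
    x, y = rng.normal(size=5), rng.normal(size=5)
    num = np.linalg.norm(softmax(x) - softmax(y))
    den = np.linalg.norm(x - y)
    ratios.append(num / den)
print(max(ratios) <= 1.0)  # True
```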
\section{Performance-Based Similarity Measure}
As explained in Chapter 5, functional similarity measures compare the outputs $\mathbf{O}, \mathbf{O}' \in \mathbb{R}^{N \times C}$.
Each element $O_{i,c}$ denotes the probability or score of class $c$ for input $x_i$.
That a class $\hat{c}$ is the prediction for input $x_i$ is indicated by
\[
\arg\max_c O_{i,c} = \hat{c}.
\]
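The argmax prediction rule can be sketched on a small hypothetical output matrix:

```python
import numpy as np

# Hypothetical output matrix O for N = 3 inputs and C = 4 classes
O = np.array([[0.10, 0.70, 0.10, 0.10],
              [0.30, 0.20, 0.40, 0.10],
              [0.25, 0.25, 0.25, 0.25]])
predictions = np.argmax(O, axis=1)  # class with the highest score per input
print(predictions)  # [1 2 0]
```

Note that `np.argmax` breaks ties (as in the last row) by returning the first maximal index.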
One of the most common views on functional similarity is to consider models as similar if they reach similar performance on some downstream task.
Chan et al. [8] considered in their definition of extrinsic homotopy the models on any downstream task.
By analysing the similarity of neural networks on a specific downstream task, we can focus on how comparable the internal representations or decisions of the models are in a specific application context.
This gives us better insight into the model behaviour, the transferability of knowledge and the suitability for specific tasks.